Lec01 - Mon 2/13: Introduction

Course Title

  • In catalog: Introduction to Statistical Sciences
  • New: Introduction to Statistical and Data Sciences

What is Data Science?

Data Science

  • Example domains: biology, economics, physics, sociology, etc.
  • So why the title switch?

Dialogue with Student

Course Objective #1

Have students engage in the data/science research pipeline in as faithful a manner as possible while maintaining a level suitable for novices.

  • Cobb: Minimizing prerequisites to research
  • Not necessarily publishing in top journals, but answering scientific questions with data.
  • Difficult to do research without understanding stats, however

Data/Science Research Pipeline

We will, as best we can, perform all this:

Data/Science Research Pipeline

And not just this, as in many previous intro stats courses:

Course Objective #2

Foster a conceptual understanding of statistical topics and methods using simulation/resampling and real data whenever possible, rather than mathematical formulae.

  • Whenever we can, use real data
  • Example data set: nycflights13
  • There are two “engines” that can make statistics “work”
    • Mathematics: formulas, approximations, etc
    • Computers: simulations, random number generation
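
To make the two "engines" concrete, here is a minimal sketch in R that answers one question both ways: what is the chance of getting at least 6 heads in 10 fair coin flips? The question, the seed, and the variable names (n_sims, heads, sim_prob, exact_prob) are illustrative assumptions, not part of the course materials.

```r
# Hypothetical example: P(at least 6 heads in 10 fair coin flips)

set.seed(116)    # arbitrary seed so the simulation is reproducible
n_sims <- 10000  # number of simulated sets of 10 flips (an assumption)

# "Computer" engine: flip 10 virtual coins, count heads, repeat n_sims times
heads <- replicate(
  n_sims,
  sum(sample(c("H", "T"), size = 10, replace = TRUE) == "H")
)
sim_prob <- mean(heads >= 6)  # proportion of runs with 6 or more heads

# "Mathematics" engine: the same probability from the binomial formula
exact_prob <- 1 - pbinom(5, size = 10, prob = 0.5)

sim_prob    # close to exact_prob, but varies with the seed and n_sims
exact_prob  # 386/1024, about 0.377
```

The simulated answer should land close to the formula-based one, and the gap tends to shrink as n_sims grows.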

The “Engine” of Statistics

In this course, computers and not math will be the “engine”. What does this mean?

  • Less of this:
    Drawing
  • But more of this:
    Drawing

Programming/Coding

  • Previous programming/coding experience is not a prerequisite to this course
  • This is not explicitly a course on programming, coding, or computer science, but we will use some elements of each.
  • You will also be exposed to basic algorithmic thinking and computational logic
  • Learning R is like learning a foreign language: it's really hard at first!

Two Simple Rules of Learning Code

  • Computers are stupid!
  • When learning, take existing code that works, and tweak it!
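
As a rough illustration of both rules, the sketch below starts from one line of R that works and then changes one thing at a time. The numbers and the names scores and scores2 are made up for this example.

```r
# Existing code that works: the average of three made-up numbers
scores <- c(2, 4, 9)
mean(scores)      # 5

# Tweak 1: change one word to get a different summary
median(scores)    # 4

# Tweak 2: keep the code, change the data
scores2 <- c(scores, 100)
mean(scores2)     # 28.75

# "Computers are stupid": R does exactly what you type and nothing more.
# Mean(scores) would fail, because R is case-sensitive and only knows mean().
```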

Course Objective #3

Blur the traditional lecture/lab dichotomy of introductory statistics courses by incorporating more computational and algorithmic thinking into the syllabus.

  • Completely separate lecture and labs is a legacy of a time before
    Drawing

RStudio Server

  • Not all laptops are created equal: operating system, processing power, age
  • RStudio Server: cloud-based version of RStudio where all processing is done on Middlebury servers
  • go/rstudio/ (on campus or via VPN)

Course Objective #5

Develop statistical literacy by, among other ways, tying the curriculum to current events and demonstrating the important role statistics plays in society.

  • H.G. Wells (paraphrased): “Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write.”
  • Me: “Sure, it’s easy to lie with statistics. But it’s also hard to tell the truth without them.”

Final Project

  • Capstone experience to align the topics and principles of this course with how research and learning are done in practice.
  • Work on interpersonal and collaborative skills. No textbook on that!

Lecture Format

Either

  • Lab format: With laptop
    • You sit in groups of 4
    • I’ll talk for 10-15 minutes before you work on learning checks
  • Chalk talk: Old-school
    • Keep desks in rows
    • More traditional lecture format

Let’s Build our Toolbox

R, RStudio, and DataCamp

  • R: Software behind the scenes i.e. the engine
  • RStudio: Integrated development environment i.e. the interface
  • DataCamp: Browser-based learning tool i.e. the driver’s ed teacher

Analogy

R RStudio DataCamp
Drawing Drawing Drawing

Test Drive RStudio

  • Log in to go/rstudio/ with your Midd account
  • If you don’t have access, raise your hand. (Username: guest1, password: rstudioguest)
  • In RStudio menu bar -> File -> New File -> R Script

The Four Panels

  1. Console: Crunch numbers in R
  2. Files, Packages, Help: See your files, install packages, read help files
  3. Editor: Where you’ll write code and save it
  4. Environment: Your workspace

Important: Console

  • This is where you run/execute commands
  • The “>” is the prompt. It means R is ready to receive commands
  • If you don’t see a “>” and want to restart, press ESC.
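
A few commands you might try at the prompt, as a minimal sketch. The name x is an arbitrary choice for illustration.

```r
# Typed at the ">" prompt, each line runs as soon as you press Enter.
2 + 3        # basic arithmetic; prints [1] 5
sqrt(16)     # calling a built-in function; prints [1] 4
x <- 2 + 3   # store a result under the name x (nothing is printed)
x * 10       # reuse the stored value; prints [1] 50
```

If you press Enter on an unfinished command such as `2 +`, the console shows a `+` instead of `>`, meaning R is waiting for the rest of the command. Either finish the command or press ESC to get back to `>`.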

Switching Gears

Now we will use R via DataCamp instead of via RStudio, but just for driver’s ed. Two panels exist in both:

  1. Editor panel: Where you write code
  2. Console panel: Where you will execute code